Advanced Prompting Techniques

Chain-of-Thought (CoT): Making Reasoning Visible

Chain-of-Thought (CoT) is less common with reasoning models, since they already perform an explicit reasoning step. With SLMs and other non-reasoning models, however, CoT can still make a meaningful difference.That said, it’s still valuable to learn CoT techniques—they help you understand how these models think and how to effectively influence their behavior.

The Problem:

prompt = "What's 15% tip on a $47.83 bill?"
response = "$7.17"  # Correct

But what if you need to debug a wrong answer? You can’t see the reasoning. The Solution: CoT: Full runnable chain-of-thought notebook

prompt = """
Calculate 15% tip on a $47.83 bill.

Think step by step:
"""

response = """
Step 1: Convert 15% to decimal: 0.15
Step 2: Multiply: $47.83 × 0.15 = $7.1745
Step 3: Round to cents: $7.17

Answer: $7.17
"""

In Production:

Use CoT for complex reasoning; avoid for deterministic extraction/classification at temperature=0.
Consider privacy/compliance: avoid logging sensitive intermediate reasoning.
Cost/latency rise with longer outputs—use selectively.

Why It Works:

Often improves performance on reasoning tasks (magnitude varies by task/model)
Creates “intermediate tokens” that guide the model
Makes errors debuggable

Production Pattern:

def cot_prompt(question: str) -> str:
    return f"""
<question>{question}</question>

<instructions>
Solve this step by step:
1. Identify what information you need
2. Break down the problem into sub-steps
3. Solve each sub-step
4. Combine into final answer
5. Verify your answer makes sense
</instructions>

<thinking>
[Your step-by-step reasoning here]
</thinking>

<final_answer>
[Your final answer here]
</final_answer>
"""

Real-World Impact:

Code generation: 35% fewer bugs with CoT
Math problems: 50-70% accuracy improvement
Medical diagnosis: More reliable clinical reasoning

Self-Consistency: Voting for Reliability

The Problem: One response might be wrong due to non-determinism, ambiguous tasks, and/or valid solution paths. The Solution: Generate multiple responses and vote. Full runnable self-consistency notebook

async def self_consistent_answer(
    prompt: str,
    n: int = 5,
    temperature: float = 0.7
) -> str:
    """
    Generate multiple answers and return the most common one.
    """
    responses = []
    
    for _ in range(n):
        response = await llm.generate(
            prompt=prompt,
            temperature=temperature
        )
        responses.append(response)
    
    # Count occurrences (or use semantic similarity - more about this later -)
    from collections import Counter
    answer_counts = Counter(responses)
    
    # Return most common answer
    most_common = answer_counts.most_common(1)[0][0]
    
    return most_common

When to Use:

High-stakes decisions (medical, financial, legal)
Complex reasoning where errors are costly
Classification tasks where confidence matters

Cost Consideration:

5x Agent tasks = 5x cost
Use only when accuracy justifies expense

Performance Data:

CoT often improves performance on reasoning benchmarks; magnitude varies by task/model (see Wei et al., 2022)
Combining CoT + Self-Consistency can yield additional gains; magnitude varies by task/model (see Wang et al., 2022)
Always validate on your evaluation set; do not assume universal gains

Extended Thinking: Anthropic’s Secret Weapon

Claude-Specific Feature: Claude can expose its “thinking” before answering using special tags.

prompt = """
Analyze this complex legal document...
Think before you write the analysis report in <thinking> tags.
"""

Why This Matters:

Debugging: See where reasoning went wrong
Quality: Forces model to think before answering
Transparency: Clients can audit AI decisions

Thinking tags can also be used to guide Claude steps:

def analyze_contract(contract_text: str) -> dict:
    prompt = f"""
<document>
{contract_text}
</document>

<thinking>
I need to analyze this contract for:
1. Key obligations
2. Termination clauses
3. Liability limits
4. Red flags

Let me work through each section...
</thinking>

Provide a JSON response with:
- obligations: list of key obligations
- risks: list of potential risks
- recommendations: list of recommended actions
"""
    
    response = claude.generate(prompt)
    
    # Parse thinking section for audit trail
    thinking = extract_between_tags(response, "thinking")
    result = extract_json(response)
    
    return {
        "analysis": result,
        "reasoning": thinking,  # Store for compliance/review
    }

Prompt Chaining: Breaking Complex Tasks

Single Prompt Limitations:

Context window fills up
Errors compound
Hard to debug
Expensive to retry

Chaining Solution: Break one complex task into sequential simple tasks. Example: Customer Support Automation: Full runnable customer support automation notebook

async def handle_support_ticket(ticket: str):
    # Step 1: Classify urgency
    urgency = await classify_urgency(ticket)
    
    # Step 2: Extract key details (only if high urgency)
    if urgency == "high":
        details = await extract_details(ticket)
        
        # Step 3: Search knowledge base
        relevant_docs = await search_kb(details["issue"])
        
        # Step 4: Generate response
        response = await generate_response(
            ticket=ticket,
            docs=relevant_docs,
            urgency=urgency
        )
    else:
        # Low urgency: simpler path
        response = await generate_response(ticket)
    
    return response

Benefits:

Each step is simple → fewer errors
Failed steps can retry independently
Cheaper: Only call expensive steps when needed
Easier to evaluate and improve

Trade-off:

More latency (sequential calls)
More complex code
Multiple LLM calls (but often cheaper overall)

Home

Context Engineering & Prompt Design

Retrieval Augmented Generation (RAG)

AI Agents

Agent Reliability & Optimization

Multi-Agent Systems & Coordination

Advanced Prompting Techniques

Chain-of-Thought (CoT): Making Reasoning Visible

Self-Consistency: Voting for Reliability

Extended Thinking: Anthropic’s Secret Weapon

Prompt Chaining: Breaking Complex Tasks

Home

Context Engineering & Prompt Design

Retrieval Augmented Generation (RAG)

AI Agents

Agent Reliability & Optimization

Multi-Agent Systems & Coordination

​Chain-of-Thought (CoT): Making Reasoning Visible

​Self-Consistency: Voting for Reliability

​Extended Thinking: Anthropic’s Secret Weapon

​Prompt Chaining: Breaking Complex Tasks

Chain-of-Thought (CoT): Making Reasoning Visible

Self-Consistency: Voting for Reliability

Extended Thinking: Anthropic’s Secret Weapon

Prompt Chaining: Breaking Complex Tasks